Pseudo-convergent Q-Learning by Competitive Pricebots
Abstract
We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error consisting of symmetric policies, and a non-stationary, broken-symmetry pseudo-solution, with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.
Similar resources
Coco-Q: Learning in Stochastic Games with Side Payments
Coco ("cooperative/competitive") values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing ...
Shopbots and Pricebots
Shopbots are software agents that automatically gather and collate information from multiple on-line vendors about the price and quality of consumer goods and services. Rapidly increasing in number and sophistication, shopbots are helping more and more buyers minimize expenditure and maximize satisfaction. In response to this trend, it is anticipated that sellers will come to rely on pricebots,...
An Online Convergent Q-learning Algorithm with Linear Function Approximation
We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show t...
Joint Action Learners in Competitive Stochastic Games
This thesis investigates the design of adaptive, utility-maximizing software agents for competitive multi-agent settings. The focus is on evaluating the theoretical and empirical performance of Joint Action Learners (JALs) in settings modeled as stochastic games. JALs extend the well-studied Q-learning algorithm. A previously introduced JAL optimizes with respect to stationary or convergent oppo...
Construction and Validation of a Scale for Measuring the Organizational Learning Process
Organizational learning is the process that updates and changes an organization's shared mental models by acquiring data, using information, and creating and institutionalizing knowledge within the organization, which in turn leads to competitive advantage, profitability growth, and ultimately improved organizational performance. In other words, organizational learning essentially aims the organizat...